Start: 19 Mar 2018.

Last update: 28 Oct 2020

Using Srec yearlings + male juveniles.

Max. nonlinear number of knots = 1 (df = 2).

Using cases/N for all sex/age classes.



What drives lepto outbreaks?

Statistical comparison of models composed of different combinations of qualitatively chosen variables.
Main goal: assess relative importance of intrinsic (susceptibles) vs extrinsic (environment) drivers.

Approach:
- Comparison of GAMs using a combination of “feature selection” and adjusted R^2.
- (add likelihood ratio tests?) - Random Forests or Boosted Regression Trees - Hierarchical Bayesian model will be attempted. If successful, this would give more refined information about where (susceptibles, transmission/contacts, susceptibility) environment acts most strongly.

Model comparison of GAMS is done using hv-block crossvalidation, for models containing up to 4 variables, assessing all combinations.
Model stats are:
- Adjusted R2
- Feature mismatching (number of mismatching outbreak categories).
Ranking is done based on the combined ranks of these two statistics:
Each model is ranked according to each statistic, resulting in 2 ranking values where 1 is best.
These values are summed for each model, and this combined value is used for final ranking.
Ties receive the same ranking value, and the next ranking model receives the next ranking value.
E.g. if 4 models have the same ranking, they als receive value 1. Model 5, the next best ranking, then gets value 2 (as opposed to value 5 because it is the fifth model). This ensures that small differences between values are not penalized too much.
Before ranking, all values are rounded to two digits behind the comma (or whatever this is called).

Lasso regression, random forests and boosted regression trees are used as a parallel approach, to see whether the same variables are selected.





Dataset:
- Cases without male subadult/adults
- Susceptible reconstruction, scale 100, yearlings (1y old) + male juveniles (2+3y old)
- SST south, months 6-7, 8-10 (means)
- SST central, months 6-7 and 8-10 (means)
- SST north, months 8-10 (mean)
- Spring transition at 36N
- Spring transition at 39N
- Spring transition at 45N
- Upwelling south (36N), months 8-10 (mean)
- Upwelling central (39N), months 8-10 (mean)
- Pup survival as proxy for conditions in south - Yearling survival as proxy for conditions in south & center



Nonlinearity selection



variable dfs
Srec.yearlings.male.juv 1
Pup.surv 1
Yearling.surv 1
ST36 1
ST39 1
ST45 1
SSTsouth.6to7 1
SSTsouth.8to10 1
SSTcentral.6to7 1
SSTcentral.8to10 1
SSTnorth.8to10 1
Upw.south.8to10 2
Upw.central.8to10 1



Model selection




Top 20 models:



parms Combined.rank AdjR2.cv AdjR2.full FMM.cv FMM.full AIC.full deltaAIC Pval AdjR2.rank FMM.rank prop.dev.explained
Srec.yearlings.male.juv - ST36 - SSTcentral.6to7 - Upw.south.8to10 1 0.18 0.41 12 11 -378.2 -0.1 0.000039 1 1 0.496
Srec.yearlings.male.juv - ST36 2 0.15 0.28 16 15 -373.5 -4.8 0.000234 2 4 0.332
Srec.yearlings.male.juv - ST36 - SSTcentral.6to7 2 0.13 0.31 15 14 -374.2 -4.1 0.000170 3 3 0.388
Srec.yearlings.male.juv - Yearling.surv - ST36 - SSTcentral.6to7 2 0.12 0.41 14 11 -378.3 0.0 0.000015 4 2 0.498
Srec.yearlings.male.juv - ST36 - SSTcentral.6to7 - SSTcentral.8to10 3 0.09 0.30 14 14 -372.9 -5.4 0.000535 6 2 0.401
Srec.yearlings.male.juv - ST36 - ST39 - SSTcentral.6to7 4 0.11 0.35 16 13 -375.4 -2.9 0.000223 5 4 0.447
Srec.yearlings.male.juv - ST36 - ST39 5 0.06 0.31 15 12 -374.0 -4.3 0.000664 7 3 0.383
Srec.yearlings.male.juv - SSTcentral.6to7 6 0.11 0.21 18 18 -370.5 -7.8 0.003708 5 6 0.265
Srec.yearlings.male.juv - Yearling.surv - ST36 6 0.09 0.33 17 12 -375.0 -3.3 0.000068 6 5 0.403
Srec.yearlings.male.juv - ST36 - Upw.central.8to10 6 0.06 0.25 16 15 -371.6 -6.7 0.001136 7 4 0.333
Srec.yearlings.male.juv - SSTsouth.8to10 - SSTcentral.6to7 6 0.09 0.21 17 15 -369.8 -8.5 0.007660 6 5 0.295
Srec.yearlings.male.juv - ST36 - ST45 7 0.03 0.29 14 12 -373.2 -5.1 0.000284 10 2 0.366
Srec.yearlings.male.juv - Yearling.surv - ST36 - Upw.central.8to10 7 0.04 0.31 15 11 -373.4 -4.9 0.000295 9 3 0.410
Srec.yearlings.male.juv 8 0.09 0.19 19 16 -370.6 -7.7 0.002835 6 7 0.218
SSTsouth.8to10 - SSTcentral.6to7 8 0.09 0.15 19 19 -368.2 -10.1 0.026936 6 7 0.209
Srec.yearlings.male.juv - ST36 - Upw.south.8to10 8 0.05 0.34 17 12 -375.3 -3.0 0.000301 8 5 0.408
Srec.yearlings.male.juv - SSTcentral.6to7 - Upw.central.8to10 8 0.06 0.18 18 18 -368.6 -9.7 0.012159 7 6 0.266
Srec.yearlings.male.juv - ST36 - SSTsouth.8to10 - SSTcentral.6to7 8 0.03 0.31 15 13 -373.4 -4.9 0.000640 10 3 0.410
Srec.yearlings.male.juv - SSTsouth.8to10 - SSTcentral.6to7 - SSTnorth.8to10 8 0.05 0.21 17 14 -369.0 -9.3 0.014641 8 5 0.322
SSTsouth.8to10 - SSTcentral.6to7 - SSTnorth.8to10 9 0.09 0.19 20 19 -368.8 -9.5 0.020712 6 8 0.273
Srec.yearlings.male.juv 9 0.03 0.19 16 16 -370.6 -7.7 0.002835 10 4 0.218



How often does each variable occur in the top 20 models?


variable top20.freq
Srec.yearlings.male.juv 18
ST36 13
SSTcentral.6to7 12
SSTsouth.8to10 5
Yearling.surv 3
Upw.central.8to10 3
ST39 2
SSTnorth.8to10 2
Upw.south.8to10 2
ST45 1
SSTcentral.8to10 1
Pup.surv 0
SSTsouth.6to7 0



What is the relative weight of each variable in the top 50 models, based on combined ranking?


variable weight
Srec.yearlings.male.juv 0.28
ST36 0.21
SSTcentral.6to7 0.19
SSTsouth.8to10 0.06
Upw.central.8to10 0.06
Yearling.surv 0.04
ST39 0.04
ST45 0.03
SSTnorth.8to10 0.03
Upw.south.8to10 0.03
SSTcentral.8to10 0.02
Pup.surv 0.01
SSTsouth.6to7 0.00



What are the Akaike weights of the top 20 models?


Akaike weights sum to 1 ==> gives relative importance of each model, compared to the others.


parms Combined.rank AdjR2.cv AdjR2.full FMM.cv FMM.full AIC.full deltaAIC Pval AdjR2.rank FMM.rank prop.dev.explained Akaike.weight
Srec.yearlings.male.juv - ST36 - SSTcentral.6to7 - Upw.south.8to10 1 0.18 0.41 12 11 -378.2 -0.1 0.000039 1 1 0.496 0.28
Srec.yearlings.male.juv - ST36 2 0.15 0.28 16 15 -373.5 -4.8 0.000234 2 4 0.332 0.03
Srec.yearlings.male.juv - ST36 - SSTcentral.6to7 2 0.13 0.31 15 14 -374.2 -4.1 0.000170 3 3 0.388 0.04
Srec.yearlings.male.juv - Yearling.surv - ST36 - SSTcentral.6to7 2 0.12 0.41 14 11 -378.3 0.0 0.000015 4 2 0.498 0.29
Srec.yearlings.male.juv - ST36 - SSTcentral.6to7 - SSTcentral.8to10 3 0.09 0.30 14 14 -372.9 -5.4 0.000535 6 2 0.401 0.02
Srec.yearlings.male.juv - ST36 - ST39 - SSTcentral.6to7 4 0.11 0.35 16 13 -375.4 -2.9 0.000223 5 4 0.447 0.07
Srec.yearlings.male.juv - ST36 - ST39 5 0.06 0.31 15 12 -374.0 -4.3 0.000664 7 3 0.383 0.03
Srec.yearlings.male.juv - SSTcentral.6to7 6 0.11 0.21 18 18 -370.5 -7.8 0.003708 5 6 0.265 0.01
Srec.yearlings.male.juv - Yearling.surv - ST36 6 0.09 0.33 17 12 -375.0 -3.3 0.000068 6 5 0.403 0.06
Srec.yearlings.male.juv - ST36 - Upw.central.8to10 6 0.06 0.25 16 15 -371.6 -6.7 0.001136 7 4 0.333 0.01
Srec.yearlings.male.juv - SSTsouth.8to10 - SSTcentral.6to7 6 0.09 0.21 17 15 -369.8 -8.5 0.007660 6 5 0.295 0.00
Srec.yearlings.male.juv - ST36 - ST45 7 0.03 0.29 14 12 -373.2 -5.1 0.000284 10 2 0.366 0.02
Srec.yearlings.male.juv - Yearling.surv - ST36 - Upw.central.8to10 7 0.04 0.31 15 11 -373.4 -4.9 0.000295 9 3 0.410 0.03
Srec.yearlings.male.juv 8 0.09 0.19 19 16 -370.6 -7.7 0.002835 6 7 0.218 0.01
SSTsouth.8to10 - SSTcentral.6to7 8 0.09 0.15 19 19 -368.2 -10.1 0.026936 6 7 0.209 0.00
Srec.yearlings.male.juv - ST36 - Upw.south.8to10 8 0.05 0.34 17 12 -375.3 -3.0 0.000301 8 5 0.408 0.07
Srec.yearlings.male.juv - SSTcentral.6to7 - Upw.central.8to10 8 0.06 0.18 18 18 -368.6 -9.7 0.012159 7 6 0.266 0.00
Srec.yearlings.male.juv - ST36 - SSTsouth.8to10 - SSTcentral.6to7 8 0.03 0.31 15 13 -373.4 -4.9 0.000640 10 3 0.410 0.03
Srec.yearlings.male.juv - SSTsouth.8to10 - SSTcentral.6to7 - SSTnorth.8to10 8 0.05 0.21 17 14 -369.0 -9.3 0.014641 8 5 0.322 0.00
SSTsouth.8to10 - SSTcentral.6to7 - SSTnorth.8to10 9 0.09 0.19 20 19 -368.8 -9.5 0.020712 6 8 0.273 0.00
Srec.yearlings.male.juv 9 0.03 0.19 16 16 -370.6 -7.7 0.002835 10 4 0.218 0.01



What is the relative weight of each variable in the top 20 models, based on full-model Akaike weights?


Variable importance is calculated as the sum of the Akaike weights of each model that variable appears in.
Table column “akaike.importance” is the sum of the Akaike weights of each model a certain variable appears in (so a weight of 1 means that it appears in all models).
Table column “relative.importance” is the akaike importance divided by the sum of all akaike importance values, and can be interpreted as a relative measure of importance.

Variable Akaike.importance Relative.importance
Srec.yearlings.male.juv 0.99 0.27
ST36 0.97 0.27
SSTcentral.6to7 0.73 0.20
Yearling.surv 0.38 0.10
Upw.south.8to10 0.35 0.10
ST39 0.10 0.03
Upw.central.8to10 0.04 0.01
SSTsouth.8to10 0.03 0.01
ST45 0.02 0.01
SSTcentral.8to10 0.02 0.01
Pup.surv 0.00 0.00
SSTsouth.6to7 0.00 0.00
SSTnorth.8to10 0.00 0.00



Best 3-variable models:


parms AdjR2.cv FeatureMismatch.cv Combined.rank AdjR2.rank FMM.rank
Srec.yearlings.male.juv - ST36 - SSTcentral.6to7 0.132 15 2 3 3
Srec.yearlings.male.juv - ST36 - ST39 0.055 15 5 7 3
Srec.yearlings.male.juv - Yearling.surv - ST36 0.086 17 6 6 5
Srec.yearlings.male.juv - ST36 - Upw.central.8to10 0.056 16 6 7 4
Srec.yearlings.male.juv - SSTsouth.8to10 - SSTcentral.6to7 0.086 17 6 6 5



Best 2-variable:


parms AdjR2.cv FeatureMismatch.cv Combined.rank AdjR2.rank FMM.rank
Srec.yearlings.male.juv - ST36 0.149 16 2 2 4
Srec.yearlings.male.juv - SSTcentral.6to7 0.113 18 6 5 6
SSTsouth.8to10 - SSTcentral.6to7 0.088 19 8 6 7
Srec.yearlings.male.juv - Upw.central.8to10 0.032 17 10 10 5
Srec.yearlings.male.juv - SSTnorth.8to10 -0.012 19 15 14 7



Best 1-variable:


parms AdjR2.cv FeatureMismatch.cv Combined.rank AdjR2.rank FMM.rank
Srec.yearlings.male.juv 0.089 19 8 6 7
SSTcentral.6to7 -0.014 25 21 14 13
ST36 -0.028 24 22 16 12
SSTnorth.8to10 -0.056 22 23 19 10
Upw.central.8to10 -0.063 22 23 19 10
ST45 -0.062 23 24 19 11
Yearling.surv -0.077 22 25 21 10
Pup.surv -0.081 24 27 21 12
ST39 -0.098 22 27 23 10
SSTsouth.8to10 -0.108 22 28 24 10






What is the relative contribution of each variable in the top model?


The top model is recreated using a glm (with a quadratic term for upwelling), with normalized variables so that the effect estimates represent the relative contribution of each variable.

(not sure how to interpret/compare the quadratic term here)


Coefficients:


##                    (Intercept) scale(Srec.yearlings.male.juv) 
##                   0.0006878489                   1.6536107941 
##                    scale(ST36)         scale(SSTcentral.6to7) 
##                   1.3642161393                   0.8125517562 
##         scale(Upw.south.8to10)  I((scale(Upw.south.8to10))^2) 
##                   0.8559189110                   0.8726303863




Susceptible estimates compared



parms Combined.rank AdjR2.cv AdjR2.full FMM.cv FMM.full AIC.full deltaAIC Pval AdjR2.rank FMM.rank prop.dev.expl Akaike.weight
Srec.y.mj 1 0.09 0.20 19 16 -391.1 0.0 0.002428 1 1 0.224 0.51
Srec.y.mj.fs 2 -0.03 0.14 21 17 -388.9 -2.2 0.015818 2 2 0.167 0.17
Srec.y.fs.fa 3 -0.04 -0.03 22 20 -383.3 -7.8 0.608846 3 3 0.009 0.01
Srec.y.mj.fs.fa 4 -0.12 0.05 22 17 -385.8 -5.3 0.108769 5 3 0.082 0.04
Srec.y.mj.ms.ma.fs.fa 5 -0.12 0.01 24 20 -384.6 -6.5 0.225313 4 5 0.048 0.02
Srec.y.mj.ms.fs 6 -0.15 0.08 23 18 -386.8 -4.3 0.056311 6 4 0.110 0.06
Srec.y.mj.ms.fs.fa 7 -0.16 0.05 23 20 -385.9 -5.2 0.095307 7 4 0.086 0.04
Srec.y.mj.ms 8 -0.17 0.12 23 16 -388.1 -3.0 0.021538 8 4 0.147 0.11
Srec.y.mj.ms.ma 8 -0.16 0.05 26 17 -385.7 -5.4 0.111142 6 6 0.079 0.03
Srec.y.fs 9 -0.21 0.00 24 22 -384.2 -6.9 0.289152 9 5 0.036 0.02




Stats plots for the top models:



Feature mismatch difference of the 20 best models




AdjR2 of the 20 best models





Figures of predicted cases for the top models, full dataset (no crossval):



## [1] "Model 1:     Rank = 1,     AdjR2.CV = 0.184,    Feature Match = 12"
## [1] "Srec.yearlings.male.juv" "ST36"                   
## [3] "SSTcentral.6to7"         "Upw.south.8to10"
## [1] "              "
## [1] "Model 2:     Rank = 2,     AdjR2.CV = 0.149,    Feature Match = 16"
## [1] "Srec.yearlings.male.juv" "ST36"
## [1] "              "



Effect functions for GAM1, for publication:



Effect functions for GAM1, for publication:

Using the same y axis ranges for all figures, and moving the title to the bottom



## [1] "Model 3:     Rank = 2,     AdjR2.CV = 0.132,    Feature Match = 15"
## [1] "Srec.yearlings.male.juv" "ST36"                   
## [3] "SSTcentral.6to7"
## [1] "              "
## [1] "Model 4:     Rank = 2,     AdjR2.CV = 0.125,    Feature Match = 14"
## [1] "Srec.yearlings.male.juv" "Yearling.surv"          
## [3] "ST36"                    "SSTcentral.6to7"
## [1] "              "

## [1] "Model 5:     Rank = 3,     AdjR2.CV = 0.086,    Feature Match = 14"
## [1] "Srec.yearlings.male.juv" "ST36"                   
## [3] "SSTcentral.6to7"         "SSTcentral.8to10"
## [1] "              "
## [1] "Model 6:     Rank = 4,     AdjR2.CV = 0.109,    Feature Match = 16"
## [1] "Srec.yearlings.male.juv" "ST36"                   
## [3] "ST39"                    "SSTcentral.6to7"
## [1] "              "

## [1] "Model 7:     Rank = 5,     AdjR2.CV = 0.055,    Feature Match = 15"
## [1] "Srec.yearlings.male.juv" "ST36"                   
## [3] "ST39"
## [1] "              "
## [1] "Model 8:     Rank = 6,     AdjR2.CV = 0.113,    Feature Match = 18"
## [1] "Srec.yearlings.male.juv" "SSTcentral.6to7"
## [1] "              "

## [1] "Model 9:     Rank = 6,     AdjR2.CV = 0.086,    Feature Match = 17"
## [1] "Srec.yearlings.male.juv" "Yearling.surv"          
## [3] "ST36"
## [1] "              "
## [1] "Model 10:     Rank = 6,     AdjR2.CV = 0.056,    Feature Match = 16"
## [1] "Srec.yearlings.male.juv" "ST36"                   
## [3] "Upw.central.8to10"
## [1] "              "



Susceptibles only model:





Environment only:



## [1] "Model 17:     Rank = 8,     AdjR2.CV = 0.062,    Feature Match = 18"
## [1] "Srec.yearlings.male.juv" "SSTcentral.6to7"        
## [3] "Upw.central.8to10"
## [1] "              "



Top model, with susceptibles.



Predicted cases using model averaging.


The predictions of the top 20 models are averages according to Akaike weights.





Model averaging stats:


Adjusted R2: 0.4173168
Feature mismatches: 12
% deviance explained: 0.5005573









Smoothed vs observed susceptibles


Srec.method FMM AdjR2 Prop.dev.explained AIC
S.smooth 17 0.15 0.187 -369.3
S.observed 16 0.18 0.218 -370.6







Contributions of different drivers to outbreaks:











Smoothed vs observed susceptibles, 1984 to 2006, for comparison with Kim Jamie results

Srec.method FMM AdjR2 Prop.dev.explained AIC
S.smooth 16 0.40 0.278 -291.2
S.observed 15 0.43 0.313 -292.4

Raw data selected variables



Cases / N




Cases / N monthly


##      Var1 Freq
## 1 1984-02    1
## 2 1984-05    1
## 3 1984-07   11
## 4 1984-08   36
## 5 1984-09   27
## 6 1984-10   19





Population size and survival


##   year        N
## 1 1984 66388.13
## 2 1985 64430.63
## 3 1986 66599.83
## 4 1987 70652.75
## 5 1988 76592.43
## 6 1989 82278.99



S rec yearlings + male juveniles





Figures of selected variables vs outbreak category



Susceptibles



Sea surface temperature



Sea surface temperature south June-July




Sea surface temperature central June-July




Upwelling index

Upwelling index south Aug-Sep-Oct




Upwelling index central Aug-Sep-Oct




Spring Transition



Spring transition south




Spring transition central




Spring transition north




Predict cases up to 2014



(slightly different values because scaling is now done for some variables on the 1984 - 2014 data instead of the 1984 - 2012 data)




Figures of S vs top environmental variables



Outbreak size in susceptible vs environmental component space.

Outbreak size (circle size) = cases/N.
Outbreak category:
- no outbreak < 0.0004 cases/N
- outbreak > 0.0004 cases/N
(this corresponds with the existing outbreak categories, where no outbreak is the same category, while outbreak is medium + large outbreaks.)

Susceptibles = susceptible anomaly.
Environmental component = anomaly of the linear combination of the environmental variables, with coefficients determined by model fit.




Outbreak size in susceptible vs environmental component space








Same data but separate time series:







Outbreak size in susceptibles anomaly vs anomaly of the spring transition south component of the top model.




Outbreak size in susceptibles anomaly vs anomaly of the SST central (June-July) component of the top model.




Outbreak size in susceptibles anomaly vs Upwelling south (Aug-Oct) anomaly.




What is the correlation between environmental variables and susceptibles, and between environmental variables and survival?

We already know that survival correlates with SST, from DeLong et al 2017.
But they have not used the exact same variables we did, so in order to assess how the top environmental variables that were selected in the lepto outbreak size model correlate with survival, we here do a similar analysis.

We also do this for the correlation between reconstructed susceptibles and environmental variables.

Only the three environmental variables in the top models are considered:
- ST36 - SST central Jun-Jul - Upwelling south Aug-Oct

As for the cases models, we first select the best degree of nonlinearity (max 1 knot), but now using AIC.



Susceptibles vs environmental variables.


## 
## Call:
## lm(formula = Srec.yearlings.male.juv ~ ST36, data = d1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.3647 -0.4086  0.1339  0.7385  1.3113 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.003922   0.185726   0.021    0.983
## ST36        -0.190046   0.189735  -1.002    0.325
## 
## Residual standard error: 0.9999 on 27 degrees of freedom
## Multiple R-squared:  0.03583,    Adjusted R-squared:  0.0001174 
## F-statistic: 1.003 on 1 and 27 DF,  p-value: 0.3254
## Analysis of Variance Table
## 
## Response: Srec.yearlings.male.juv
##           Df  Sum Sq Mean Sq F value Pr(>F)
## ST36       1  1.0032 1.00317  1.0033 0.3254
## Residuals 27 26.9968 0.99988
## 
## Call:
## lm(formula = Srec.yearlings.male.juv ~ SSTcentral.6to7, data = d1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.4117 -0.5908  0.1693  0.6589  1.4948 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)  
## (Intercept)     -0.02785    0.17867  -0.156   0.8773  
## SSTcentral.6to7 -0.46318    0.24902  -1.860   0.0738 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9588 on 27 degrees of freedom
## Multiple R-squared:  0.1136, Adjusted R-squared:  0.08075 
## F-statistic:  3.46 on 1 and 27 DF,  p-value: 0.07381
## Analysis of Variance Table
## 
## Response: Srec.yearlings.male.juv
##                 Df  Sum Sq Mean Sq F value  Pr(>F)  
## SSTcentral.6to7  1  3.1802  3.1802  3.4596 0.07381 .
## Residuals       27 24.8198  0.9193                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Call:
## lm(formula = Srec.yearlings.male.juv ~ Upw.south.8to10, data = d1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.2078 -0.6201  0.1837  0.9015  1.4348 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)     -1.617e-16  1.890e-01   0.000    1.000
## Upw.south.8to10 -3.886e-02  1.923e-01  -0.202    0.841
## 
## Residual standard error: 1.018 on 27 degrees of freedom
## Multiple R-squared:  0.00151,    Adjusted R-squared:  -0.03547 
## F-statistic: 0.04083 on 1 and 27 DF,  p-value: 0.8414
## Analysis of Variance Table
## 
## Response: Srec.yearlings.male.juv
##                 Df  Sum Sq Mean Sq F value Pr(>F)
## Upw.south.8to10  1  0.0423 0.04228  0.0408 0.8414
## Residuals       27 27.9577 1.03547



Yearling survival vs environmental variables.


## Analysis of Deviance Table
## 
## Model 1: Yearling.surv ~ 1 + ST36
## Model 2: Yearling.surv ~ 1
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1        27      1.618                     
## 2        28      2.016 -1 -0.39798   0.5281
## Analysis of Deviance Table
## 
## Model 1: Yearling.surv ~ 1 + SSTcentral.6to7
## Model 2: Yearling.surv ~ 1
##   Resid. Df Resid. Dev Df  Deviance Pr(>Chi)
## 1        27     1.9394                      
## 2        28     2.0160 -1 -0.076633   0.7819
## Analysis of Deviance Table
## 
## Model 1: Yearling.surv ~ 1 + Upw.south.8to10
## Model 2: Yearling.surv ~ 1
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1        27      1.878                     
## 2        28      2.016 -1 -0.13795   0.7103



# Susceptibles vs cases


S vs cases vs time 3D

S vs cases vs time all data

Susceptibles vs cases vs time with environment as colours

Environmental model component = colour

Green = low, red = high.

! Note: axis Cases/N goes from large to small in the default 3D plot setting



Spring transition = colour

Green = low, red = high.

! Note: axis Cases/N goes from large to small in the default 3D plot setting



SST = colour

Green = low, red = high.

! Note: axis Cases/N goes from large to small in the default 3D plot setting



Upwelling = colour

Green = low, red = high.
! Note: axis Cases/N goes from large to small in the default 3D plot setting



Cases of the current year vs susceptibles of the next year.





Epidemic growth rates vs population size.


Exponential growth rates calculated for outbreak years.
Using Poisson regression to estimate the exponential growth rate.
Fitted time periods for each year = beginning of June until the beginning of the month with the maximum number of cases (the peak month).


There is no significant correlation between population size and growth rate (pos or neg).


## Analysis of Variance Table
## 
## Response: growth.rate$lambda
##               Df   Sum Sq  Mean Sq F value Pr(>F)
## growth.rate$N  1 0.046913 0.046913  1.8793 0.1977
## Residuals     11 0.274596 0.024963



Lasso regression assuming binomial distribution

Considering the fact that none of the distributions result in good residuals, and those for binomial and gamma-log families are the same, it might be informative to do lassa regression assuming a binomial distribution.

## [1] "(Intercept)"             "Srec.yearlings.male.juv"
## [3] "ST36"
## [1] "(Intercept)"             "Srec.yearlings.male.juv"
## [3] "Pup.surv"                "ST36"
## [1] "(Intercept)"             "Srec.yearlings.male.juv"
## [3] "Pup.surv"                "ST36"                   
## [5] "ST45"                    "SSTsouth.8to10"
## [1] "(Intercept)"             "Srec.yearlings.male.juv"
## [3] "Pup.surv"                "Yearling.surv"          
## [5] "ST36"                    "ST45"                   
## [7] "SSTsouth.8to10"



Random forests




Boosted regression trees


##                                             var    rel.inf
## Srec.yearlings.male.juv Srec.yearlings.male.juv 24.2060898
## Upw.central.8to10             Upw.central.8to10 20.1672833
## ST36                                       ST36 18.5638816
## Yearling.surv                     Yearling.surv 13.6927443
## SSTcentral.6to7                 SSTcentral.6to7  8.2878982
## SSTsouth.6to7                     SSTsouth.6to7  4.2126872
## Pup.surv                               Pup.surv  4.1833724
## ST45                                       ST45  1.8401817
## SSTsouth.8to10                   SSTsouth.8to10  1.4135953
## ST39                                       ST39  1.3685602
## Upw.south.8to10                 Upw.south.8to10  1.3183986
## SSTnorth.8to10                   SSTnorth.8to10  0.7453075
## SSTcentral.8to10               SSTcentral.8to10  0.0000000




Which density distribution to use?


None of the density distributions below result in good residuals.
Gamma log results in stronger correlations.
No difference in AIC between different link functions for Gamma distribution.
==> I think we’re good with Gamma log.


## 
## Call:
## glm(formula = caseN ~ Srec.yearlings.male.juv + SSTsouth.8to10, 
##     family = "binomial", data = d1)
## 
## Deviance Residuals: 
##       Min         1Q     Median         3Q        Max  
## -0.034578  -0.015572  -0.002384   0.008084   0.040742  
## 
## Coefficients:
##                         Estimate Std. Error z value Pr(>|z|)
## (Intercept)              -7.3118     7.6002  -0.962    0.336
## Srec.yearlings.male.juv   0.4328     8.3597   0.052    0.959
## SSTsouth.8to10            0.1334     9.0662   0.015    0.988
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 0.0132854  on 28  degrees of freedom
## Residual deviance: 0.0099502  on 26  degrees of freedom
## AIC: 6.0415
## 
## Number of Fisher Scoring iterations: 10
## 
## Call:
## glm(formula = caseN ~ Srec.yearlings.male.juv + SSTsouth.8to10, 
##     family = Gamma(link = "log"), data = d1)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.4401  -0.8183  -0.1542   0.3873   1.3518  
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             -7.31778    0.13846 -52.852  < 2e-16 ***
## Srec.yearlings.male.juv  0.40896    0.13869   2.949  0.00666 ** 
## SSTsouth.8to10           0.02706    0.18324   0.148  0.88374    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Gamma family taken to be 0.5379045)
## 
##     Null deviance: 20.981  on 28  degrees of freedom
## Residual deviance: 16.391  on 26  degrees of freedom
## AIC: -364.58
## 
## Number of Fisher Scoring iterations: 6
## 
## Call:
## glm(formula = caseN ~ Srec.yearlings.male.juv + SSTsouth.8to10, 
##     family = Gamma(link = "inverse"), data = d1)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.4357  -0.7443  -0.1609   0.2296   1.4716  
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               1629.0      248.4   6.559 5.92e-07 ***
## Srec.yearlings.male.juv   -652.4      260.9  -2.501    0.019 *  
## SSTsouth.8to10            -102.9      218.4  -0.471    0.642    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Gamma family taken to be 0.5807588)
## 
##     Null deviance: 20.981  on 28  degrees of freedom
## Residual deviance: 16.161  on 26  degrees of freedom
## AIC: -365.03
## 
## Number of Fisher Scoring iterations: 6
## 
## Call:
## glm(formula = caseN ~ Srec.yearlings.male.juv + SSTsouth.8to10, 
##     family = Gamma(link = "identity"), data = d1)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5192  -0.8636  -0.1328   0.4677   1.2175  
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              7.063e-04  9.772e-05   7.228 1.12e-07 ***
## Srec.yearlings.male.juv  2.423e-04  6.959e-05   3.482  0.00178 ** 
## SSTsouth.8to10          -6.787e-05  9.514e-05  -0.713  0.48199    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Gamma family taken to be 0.5053988)
## 
##     Null deviance: 20.981  on 28  degrees of freedom
## Residual deviance: 16.580  on 26  degrees of freedom
## AIC: -364.22
## 
## Number of Fisher Scoring iterations: 25